ESOH Risk Assessment

Overview

Environmental, Safety, and Occupational Health (ESOH) risks include:

Per DoDI 5000.02 (Operation of the Adaptive Acquisition Framework), paragraph 4.1.b.(6), "In consultation with the user representative, the PM will determine which environment, safety, and occupational health risks must be eliminated or mitigated, and which risks can be accepted."

Per DoDI 5000.85 (Major Capability Acquisition), paragraph 3C.3.d.(2), "The PM is responsible for integrating ESOH considerations into the decision-making process."

DoDI 5000.88 (Engineering of Defense Systems), paragraph 3.6.e. (System Safety), provides Lead Systems Engineer (LSE) responsibilities for (1) System Safety Engineering, (2) the Programmatic Environment, Safety, and Occupational Health Evaluation (PESHE), (3) the National Environmental Policy Act (NEPA) and E.O. 12114, (4) Mishap Investigation Support, and (5) System Safety in the Systems Engineering Plan (SEP). The instruction directs the LSE to use the methodology in MIL-STD-882 to address ESOH risks associated with system-related hazards, and to use the guidance in the DoD Joint Software Systems Safety Engineering Handbook to achieve an acceptable level of software system safety risk.

Many organizations use both the MIL-STD-882 process and the DoD Risk Management process to address ESOH-related risks, because such risks often have programmatic implications as well.

System Safety Process

Flowchart showing eight boxes connected in order. Element 1 - Document the System Safety Approach. Element 2 - Identify and Document Hazards. Element 3 - Assess and Document Risk. Element 4 - Identify and Document Risk Mitigation Measures. Element 5 - Reduce Risk. Element 6 - Verify, Validate, and Document Risk Reduction. Element 7 - Accept Risk and Document. Element 8 - Manage Life-Cycle Risk.
Eight elements of the system safety process. MIL-STD-882E, Figure 1, 27 Sep 2023

Note that the eight elements of the system safety process map to the five steps of the DoD Risk Management Process. The description of each element below is adapted, sometimes verbatim, from MIL-STD-882E, paragraphs 4.3.1 – 4.3.8.

Element 1: Document System Safety Approach

Element 1 (Document the System Safety Approach) maps to Risk Process Planning. The PM and contractor shall document the system safety approach for managing hazards as an integral part of the SE process. The minimum requirements for the approach include:

  1. Describing the risk management effort and how the program is integrating risk management into the SE process, the Integrated Product and Process Development process, and the overall program management structure.
  2. Identifying and documenting the prescribed and derived requirements applicable to the system. Examples include Insensitive Munitions (IM) requirements, Electromagnetic Environmental Effects (E3) requirements, Civilian Harm Mitigation and Response (CHMR) requirements, pollution prevention mandates, design requirements, technology considerations, and occupational and community noise standards. Once the requirements are identified, ensure their inclusion in the system specifications and the flow-down of applicable requirements to subcontractors, vendors, and suppliers.
  3. Defining how hazards and associated risks are formally accepted by the appropriate risk acceptance authority and concurred with by the user representative in accordance with applicable DoDI 5000 series.
  4. Documenting hazards with a closed-loop Hazard Tracking System (HTS). The HTS will include, as a minimum, the following data elements: identified hazards, associated mishaps, risk assessments (initial, target, event(s)), identified risk mitigation measures, selected mitigation measures, hazard status, verification of risk reductions, and risk acceptances. Both the contractor and Government shall have access to the HTS with appropriate controls on data management.

The Government shall receive and retain “government purpose rights” of all the data recorded in the HTS and any other items (i.e., studies, analyses, test data, notes or similar data) generated in the performance of the contract with respect to the HTS.
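
The minimum HTS data elements listed above can be pictured as a simple record. The following is a minimal sketch, not a prescribed schema: the class names, field names, and the closure rule in close() are illustrative assumptions, and the hazard in the usage note is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskAssessment:
    kind: str         # "initial", "target", or "event"
    severity: int     # Table I severity category, 1 (Catastrophic) to 4 (Negligible)
    probability: str  # Table II probability level, "A" (Frequent) to "F" (Eliminated)

    @property
    def rac(self) -> str:
        """Risk Assessment Code: one severity category plus one probability level."""
        return f"{self.severity}{self.probability}"


@dataclass
class HazardRecord:
    """One HTS entry holding the minimum data elements named in the text."""
    hazard: str
    associated_mishaps: List[str] = field(default_factory=list)
    risk_assessments: List[RiskAssessment] = field(default_factory=list)
    identified_mitigations: List[str] = field(default_factory=list)
    selected_mitigations: List[str] = field(default_factory=list)
    status: str = "open"
    verifications: List[str] = field(default_factory=list)
    risk_acceptances: List[str] = field(default_factory=list)

    def open_items(self) -> List[str]:
        """Return the names of data elements that are still empty."""
        tracked = {
            "associated_mishaps": self.associated_mishaps,
            "risk_assessments": self.risk_assessments,
            "identified_mitigations": self.identified_mitigations,
            "selected_mitigations": self.selected_mitigations,
            "verifications": self.verifications,
            "risk_acceptances": self.risk_acceptances,
        }
        return [name for name, value in tracked.items() if not value]

    def close(self) -> None:
        """Assumed closed-loop rule: refuse to close while any element is empty."""
        missing = self.open_items()
        if missing:
            raise ValueError(f"cannot close, missing: {', '.join(missing)}")
        self.status = "closed"
```

For example, a record created as HazardRecord(hazard="Hydraulic fluid leak near hot surface") reports every element as open until mishaps, assessments, mitigations, verifications, and acceptances are recorded, which is one way to make the "closed-loop" property checkable.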

Element 2: Identify and Document Hazards

Element 2 (Identify and Document Hazards) maps to Risk Identification. Hazards are identified through a systematic analysis process that includes system hardware and software, system interfaces (to include human interfaces), and the intended use or application and operational environment. Consider and use mishap data; relevant environmental and occupational health data; user physical characteristics; user knowledge, skills, and abilities; and lessons learned from legacy and similar systems. The hazard identification process shall consider the entire system lifecycle and potential impacts to personnel, infrastructure, defense systems, the public, and the environment. Identified hazards shall be documented in the Hazard Tracking System (HTS).

Element 3: Assess and Document Risk

Element 3 (Assess and Document Risk) maps to Risk Analysis. The severity category and probability level of the potential mishap(s) for each hazard across all system modes are assessed using the definitions in Tables I (Severity Categories) and II (Probability Levels) in MIL-STD-882E. However, unlike the Consequence definitions in the DoD Risk, Issue, and Opportunity Management Guide for Defense Acquisition Programs, the definitions of the Severity Categories in Table I should not be tailored for each system/program unless approval is obtained in accordance with DoD Component policy. For Probability Levels, Table II provides qualitative definitions that can be used when appropriate and representative quantitative data defining the frequency or rate of occurrence for the hazard are not available.

  1. To determine the appropriate severity category as defined in Table I for a given hazard at a given point in time, identify the potential for death or injury, environmental impact, or monetary loss. A given hazard may have the potential to affect one or all of these three areas.

    Table I: MIL-STD-882E Severity Categories Table

    Description | Severity Category | Mishap Result Criteria
    Catastrophic | 1 | Could result in one or more of the following: death, permanent total disability, irreversible significant environmental impact, or monetary loss equal to or exceeding $10M.
    Critical | 2 | Could result in one or more of the following: permanent partial disability, injuries or occupational illness that may result in hospitalization of at least three personnel, reversible significant environmental impact, or monetary loss equal to or exceeding $1M but less than $10M.
    Marginal | 3 | Could result in one or more of the following: injury or occupational illness resulting in one or more lost work day(s), reversible moderate environmental impact, or monetary loss equal to or exceeding $100K but less than $1M.
    Negligible | 4 | Could result in one or more of the following: injury or occupational illness not resulting in a lost work day, minimal environmental impact, or monetary loss less than $100K.

    Source: MIL-STD-882E 27 Sep 2023

  2. To determine the appropriate probability level as defined in Table II for a given hazard at a given point in time, assess the likelihood of occurrence of a mishap. Probability level F is used to document cases where the hazard is no longer present. No amount of doctrine, training, warning, caution, or Personal Protective Equipment (PPE) can move a mishap probability to level F.

    Table II: MIL-STD-882E Probability Levels Table

    Description | Level | Specific Individual Item | Fleet or Inventory
    Frequent | A | Likely to occur often in the life of an item | Continuously experienced
    Probable | B | Will occur several times in the life of an item | Will occur frequently
    Occasional | C | Likely to occur sometime in the life of an item | Will occur several times
    Remote | D | Unlikely, but possible to occur in the life of an item | Unlikely, but can reasonably be expected to occur
    Improbable | E | So unlikely, it can be assumed occurrence may not be experienced in the life of an item | Unlikely to occur, but possible
    Eliminated | F | Incapable of occurrence. This level is used when potential hazards are identified and later eliminated. | Incapable of occurrence. This level is used when potential hazards are identified and later eliminated.

    Source: MIL-STD-882E 27 Sep 2023

    1. When available, appropriate and representative quantitative data that defines the frequency or rate of occurrence for the hazard is generally preferable to qualitative analysis. The Improbable level is generally considered to be less than one in a million. See Appendix A of MIL-STD-882E for an example of quantitative probability levels.
    2. In the absence of such quantitative frequency or rate data, reliance upon the qualitative text descriptions in Table II is necessary and appropriate.
  3. Assessed risks are expressed as a Risk Assessment Code (RAC) which is a combination of one severity category and one probability level. For example, a RAC of 1A is the combination of a Catastrophic severity category and a Frequent probability level. Table III assigns a risk level of High, Serious, Medium, or Low for each RAC.

    Table III: MIL-STD-882E Risk Assessment Matrix

    Probability \ Severity | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4)
    Frequent (A) | High | High | Serious | Medium
    Probable (B) | High | High | Serious | Medium
    Occasional (C) | High | Serious | Medium | Low
    Remote (D) | Serious | Medium | Medium | Low
    Improbable (E) | Medium | Medium | Medium | Low
    Eliminated (F) | Eliminated (all severity categories)

    Source: MIL-STD-882E 27 Sep 2023

  4. The definitions in Tables I and II, and the RACs in Table III shall be used, unless tailored alternative definitions and/or a tailored matrix are formally approved in accordance with DoD Component policy. Alternates shall be derived from Tables I through III.
  5. The program shall document all numerical definitions of probability used in risk assessments as required by MIL-STD-882E, paragraph 4.3.1. Assessed risks shall be documented in the HTS.
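
The RAC lookup described in this element can be expressed as a small table-driven function. This is a minimal sketch that assumes the untailored Tables I and III as reproduced above; the function and variable names are hypothetical, and the monetary-loss helper covers only one of the three severity criteria (death or injury and environmental impact must be judged separately, and the most severe applicable category governs).

```python
# Table III rows keyed by probability level; columns are severity 1..4.
RISK_MATRIX = {
    "A": ("High", "High", "Serious", "Medium"),
    "B": ("High", "High", "Serious", "Medium"),
    "C": ("High", "Serious", "Medium", "Low"),
    "D": ("Serious", "Medium", "Medium", "Low"),
    "E": ("Medium", "Medium", "Medium", "Low"),
}


def severity_from_monetary_loss(loss_usd: float) -> int:
    """Table I severity category implied by monetary loss alone."""
    if loss_usd >= 10_000_000:
        return 1  # Catastrophic
    if loss_usd >= 1_000_000:
        return 2  # Critical
    if loss_usd >= 100_000:
        return 3  # Marginal
    return 4      # Negligible


def worst_severity(*categories: int) -> int:
    """Lower numbers are more severe, so the governing category is the minimum."""
    return min(categories)


def risk_level(severity: int, probability: str) -> str:
    """Return the Table III risk level for a RAC such as severity 1, level 'A'."""
    level = probability.upper()
    if severity not in (1, 2, 3, 4):
        raise ValueError(f"unknown severity category: {severity}")
    if level == "F":
        return "Eliminated"  # hazard no longer present
    if level not in RISK_MATRIX:
        raise ValueError(f"unknown probability level: {probability}")
    return RISK_MATRIX[level][severity - 1]
```

With these definitions, risk_level(1, "A") returns "High" for the 1A example in the text, and a hazard whose only consequence is a $2M loss assessed as Remote evaluates as risk_level(severity_from_monetary_loss(2_000_000), "D"), which is "Medium".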

Elements 4 - 7 (map to Risk Mitigation)

Element 4 (Identify and Document Risk Mitigation Measures). Potential risk mitigation(s) shall be identified, and the expected risk reduction(s) of the alternative(s) shall be estimated and documented in the HTS. The goal should always be to eliminate the hazard if possible. When a hazard cannot be eliminated, the associated risk should be reduced to the lowest acceptable level within the constraints of cost, schedule, and performance by applying the system safety design order of precedence. The system safety design order of precedence below identifies alternative mitigation approaches and lists them in order of decreasing effectiveness.

  1. Eliminate hazards through design selection. Ideally, the hazard should be eliminated by selecting a design or material alternative that removes the hazard altogether.
  2. Reduce risk through design alteration. If adopting an alternative design change or material to eliminate the hazard is not feasible, consider design changes that reduce the severity and/or the probability of the mishap potential caused by the hazard(s).
  3. Incorporate engineered features or devices. If mitigation of the risk through design alteration is not feasible, reduce the severity or the probability of the mishap potential caused by the hazard(s) using engineered features or devices. In general, engineered features actively interrupt the mishap sequence and devices reduce the risk of a mishap.
  4. Provide warning devices. If engineered features and devices are not feasible or do not adequately lower the severity or probability of the mishap potential caused by the hazard, include detection and warning systems to alert personnel to the presence of a hazardous condition or occurrence of a hazardous event.
  5. Incorporate signage, procedures, training, and personal protective equipment (PPE). Where design alternatives, design changes, and engineered features and devices are not feasible and warning devices cannot adequately mitigate the severity or probability of the mishap potential caused by the hazard, incorporate signage, procedures, training, and PPE. Signage includes placards, labels, signs and other visual graphics. Procedures and training should include appropriate warnings and cautions. Procedures may prescribe the use of PPE. For hazards assigned Catastrophic or Critical mishap severity categories, the use of signage, procedures, training, and PPE as the only risk reduction method should be avoided.
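
The order of precedence lends itself to a simple ordered enumeration. The sketch below is illustrative only: the enum member names and both helper functions are assumptions introduced here, and the second helper merely flags the case the text says should be avoided (Catastrophic or Critical hazards mitigated solely by signage, procedures, training, and PPE).

```python
from enum import IntEnum
from typing import Iterable


class Mitigation(IntEnum):
    """System safety design order of precedence; lower value is more effective."""
    ELIMINATE_BY_DESIGN_SELECTION = 1
    REDUCE_BY_DESIGN_ALTERATION = 2
    ENGINEERED_FEATURES_OR_DEVICES = 3
    WARNING_DEVICES = 4
    SIGNAGE_PROCEDURES_TRAINING_PPE = 5


def most_effective(measures: Iterable[Mitigation]) -> Mitigation:
    """Pick the candidate highest in the order of precedence."""
    return min(measures)


def relies_only_on_level_5(severity: int, measures: Iterable[Mitigation]) -> bool:
    """Flag Catastrophic (1) or Critical (2) hazards mitigated solely by level 5."""
    selected = list(measures)
    return (
        severity in (1, 2)
        and bool(selected)
        and all(m == Mitigation.SIGNAGE_PROCEDURES_TRAINING_PPE for m in selected)
    )
```

A review checklist could call relies_only_on_level_5 for each open hazard and raise the flagged entries at technical reviews; whether a flag blocks progress is a program decision, not something this sketch decides.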

Element 5: Reduce Risk

Mitigation measures are selected and implemented to achieve an acceptable risk level. Consider and evaluate the cost, feasibility, and effectiveness of candidate mitigation methods as part of the SE and Integrated Product Team (IPT) processes. Present the current hazards, their associated severity and probability assessments, and status of risk reduction efforts at technical reviews.

Element 6: Verify, Validate, and Document Risk Reduction

Verify the implementation and validate the effectiveness of all selected risk mitigation measures through appropriate analysis, testing, demonstration, or inspection. Document the verification and validation in the HTS.

Element 7: Accept Risk and Document

Before exposing people, equipment, or the environment to known system-related hazards, the risks shall be accepted by the appropriate authority as defined in DoDI 5000.88, paragraph 3.6.e.(1)(b)1: the Component or Defense Acquisition Executive for high risks, Program Executive Officer-level for serious risks, and the PM for medium and low risks. The system configuration and associated documentation that supports the formal risk acceptance decision shall be provided to the Government for retention through the life of the system. The definitions in Tables I and II, the RACs in Table III, and the criteria in Table VI for software (see paragraph 4.4 (Software Contribution to System Risk) of MIL-STD-882E) shall be used to define the risks at the time of the acceptance decision, unless tailored alternative definitions and/or a tailored matrix are formally approved in accordance with DoD Component policy. The user representative shall be part of this process throughout the life-cycle of the system and shall provide formal concurrence before all Serious and High risk acceptance decisions.

After fielding, data from mishap reports, user feedback, and experience with similar systems or other sources may reveal new hazards or demonstrate that the risk for a known hazard is higher or lower than previously recognized. In these cases, the revised risk shall be accepted in accordance with DoDI 5000.88, paragraph 3.6.e.(1)(b)1. NOTE: A single system may require multiple event risk assessments and acceptances throughout its life-cycle. Each risk acceptance decision shall be documented in the HTS.
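
The acceptance authorities named above reduce to a lookup keyed by risk level. This is a minimal sketch of that mapping as stated in this section; the function name and return shape are hypothetical, and a program using tailored matrices approved under DoD Component policy would need its own table.

```python
from typing import Tuple

# Risk level -> acceptance authority, as summarized in this section.
ACCEPTANCE_AUTHORITY = {
    "High": "Component or Defense Acquisition Executive",
    "Serious": "Program Executive Officer-level",
    "Medium": "Program Manager",
    "Low": "Program Manager",
}

# Levels for which the user representative provides formal concurrence
# before the acceptance decision.
USER_REP_CONCURRENCE = {"High", "Serious"}


def acceptance_requirements(risk_level: str) -> Tuple[str, bool]:
    """Return (acceptance authority, user-representative concurrence required)."""
    if risk_level not in ACCEPTANCE_AUTHORITY:
        raise ValueError(f"unknown risk level: {risk_level}")
    return ACCEPTANCE_AUTHORITY[risk_level], risk_level in USER_REP_CONCURRENCE
```

For instance, acceptance_requirements("Serious") yields the Program Executive Officer-level authority with concurrence required, while "Medium" yields the Program Manager with no formal concurrence step.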

Element 8: Manage Life-Cycle Risk

Element 8 maps to Risk Monitoring. After the system is fielded, the system program office uses the system safety process to identify hazards and maintain the HTS throughout the system’s life-cycle. This life-cycle effort considers any changes, including but not limited to changes in the interfaces, users, hardware and software, mishap data, mission(s) or profile(s), and system health data. Procedures shall be in place to ensure risk management personnel are aware of these changes, e.g., by being part of the configuration control process. The program office and user community shall maintain effective communications to collaborate, identify, and manage new hazards and modified risks.

If a new hazard is discovered or a known hazard is determined to have a higher risk level than previously assessed, the new or revised risk will need to be formally accepted in accordance with DoDI 5000.88, paragraph 3.6.e.(1)(b)1. In addition, DoD requires program offices to support system-related Class A and B (as defined in DoDI 6055.07) mishap investigations by providing analyses of hazards that contributed to the mishap and recommendations for materiel risk mitigation measures, especially those that minimize human errors.

Software Contribution to System Risk

The following discussion is adapted from MIL-STD-882E, Section 4.4, Software contribution to system risk.

Software contribution to system risk. The assessment of risk for software, and consequently software-controlled or software-intensive systems, cannot rely solely on the risk severity and probability. Determining the probability of failure of a single software function is difficult at best and cannot be based on historical data. Software is generally application-specific and reliability parameters associated with it cannot be estimated in the same manner as hardware. Therefore, another approach shall be used for the assessment of software’s contributions to system risk that considers the potential risk severity and the degree of control that software exercises over the hardware.

Software assessments. Tables IV through VI shall be used, unless tailored alternative matrices are formally approved in accordance with DoD Component policy. The degree of software control is defined using the Software Control Categories (SCC) in Table IV (or approved tailored alternative). Table V provides the Software Safety Criticality Matrix (SSCM) based on Table I severity categories (or approved tailored severity categories) and Table IV SCCs. The SSCM establishes the Software Criticality Indices (SwCIs) used to define the required level of rigor (LOR) tasks. Table VI provides the relationship between the SwCI, the LOR tasks, and how not meeting the LOR task requirements affects software’s contribution to risk.

Table IV. Software Control Categories

Level / Name / Description
1 Autonomous (AT)
  • Software functionality that exercises autonomous control authority over potentially safety-significant hardware systems, subsystems, or components without the possibility of predetermined safe detection and intervention by a control entity to preclude the occurrence of a mishap or hazard. (This definition includes complex system/software functionality with multiple subsystems, interacting parallel processors, multiple interfaces, and safety-critical functions that are time critical.)
2 Semi-Autonomous (SAT)
  • Software functionality that exercises control authority over potentially safety-significant hardware systems, subsystems, or components, allowing time for predetermined safe detection and intervention by independent safety mechanisms to mitigate or control the mishap or hazard. (This definition includes the control of moderately complex system/software functionality, no parallel processing, or few interfaces, but other safety systems/mechanisms can partially mitigate. System and software fault detection and annunciation notifies the control entity of the need for required safety actions.)
  • Software item that displays safety-significant information requiring immediate operator entity to execute a predetermined action for mitigation or control over a mishap or hazard. Software exception, failure, fault, or delay will allow, or fail to prevent, mishap occurrence. (This definition assumes that the safety-critical display information may be time-critical, but the time available does not exceed the time required for adequate control entity response and hazard control.)
3 Redundant Fault Tolerant (RFT)
  • Software functionality that issues commands over safety-significant hardware systems, subsystems, or components requiring a control entity to complete the command function. The system detection and functional reaction includes redundant, independent fault tolerant mechanisms for each defined hazardous condition. (This definition assumes that there is adequate fault detection, annunciation, tolerance, and system recovery to prevent the hazard occurrence if software fails, malfunctions, or degrades. There are redundant sources of safety-significant information, and mitigating functionality can respond within any time-critical period.)
  • Software that generates information of a safety-critical nature used to make critical decisions. The system includes several redundant, independent fault tolerant mechanisms for each hazardous condition, detection, and display.
4 Influential
  • Software generates information of a safety-related nature used to make decisions by the operator but does not require operator action to avoid a mishap.
5 No Safety Impact (NSI)
  • Software functionality that does not possess command or control authority over safety-significant hardware systems, subsystems, or components and does not provide safety-significant information. Software does not provide safety-significant or time sensitive data or information that requires control entity interaction. Software does not transport or resolve communication of safety-significant time sensitive data.

Software Safety Criticality Matrix. The SSCM (Table V) uses Table I severity categories for the columns and Table IV software control categories for the rows. Table V assigns software criticality index (SwCI) numbers to each cross-referenced block of the matrix. The SSCM shall define the level of rigor (LOR) tasks associated with the specific SwCI. Although it is similar in appearance to the Risk Assessment Matrix (Table III), the SSCM is not an assessment of risk.

Table V. Software Safety Criticality Matrix (SSCM)

Software Control Category \ Severity Category | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4)
1 | SwCI 1 | SwCI 1 | SwCI 3 | SwCI 4
2 | SwCI 1 | SwCI 2 | SwCI 3 | SwCI 4
3 | SwCI 2 | SwCI 3 | SwCI 4 | SwCI 4
4 | SwCI 3 | SwCI 4 | SwCI 4 | SwCI 4
5 | SwCI 5 | SwCI 5 | SwCI 5 | SwCI 5

The LOR tasks associated with each SwCI are the minimum set of tasks required to assess the software contributions to the system-level risk.


SwCI | Level of Rigor Tasks
SwCI 1 | Program shall perform analysis of requirements, architecture, design, and code; and conduct in-depth safety-specific testing.
SwCI 2 | Program shall perform analysis of requirements, architecture, and design; and conduct in-depth safety-specific testing.
SwCI 3 | Program shall perform analysis of requirements and architecture; and conduct in-depth safety-specific testing.
SwCI 4 | Program shall conduct safety-specific testing.
SwCI 5 | Once assessed by safety engineering as Not Safety, then no safety specific analysis or verification is required.

NOTE: Consult the Joint Software Systems Safety Engineering Handbook and AOP 52 for additional guidance on how to conduct required software analyses.
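
The SSCM and its LOR tasks can be read as two chained lookups: SCC and severity give a SwCI, and the SwCI gives the minimum task set. The following sketch assumes the untailored Table V values shown above; the names SSCM, LOR_TASKS, swci, and lor_tasks are hypothetical, and the task strings are shortened labels rather than the standard's wording.

```python
from typing import Tuple

# Table V: rows are Software Control Categories 1..5,
# columns are Table I severity categories 1..4.
SSCM = {
    1: (1, 1, 3, 4),
    2: (1, 2, 3, 4),
    3: (2, 3, 4, 4),
    4: (3, 4, 4, 4),
    5: (5, 5, 5, 5),
}

# Minimum LOR tasks per SwCI (abbreviated labels).
LOR_TASKS = {
    1: ("requirements analysis", "architecture analysis", "design analysis",
        "code analysis", "in-depth safety-specific testing"),
    2: ("requirements analysis", "architecture analysis", "design analysis",
        "in-depth safety-specific testing"),
    3: ("requirements analysis", "architecture analysis",
        "in-depth safety-specific testing"),
    4: ("safety-specific testing",),
    5: (),  # Not Safety: no safety-specific analysis or verification
}


def swci(scc: int, severity: int) -> int:
    """Software Criticality Index for a control category and severity category."""
    if scc not in SSCM:
        raise ValueError(f"unknown software control category: {scc}")
    if severity not in (1, 2, 3, 4):
        raise ValueError(f"unknown severity category: {severity}")
    return SSCM[scc][severity - 1]


def lor_tasks(scc: int, severity: int) -> Tuple[str, ...]:
    """Minimum level-of-rigor tasks implied by the SSCM lookup."""
    return LOR_TASKS[swci(scc, severity)]
```

As the text notes, the result is a criticality index that sets the required rigor, not a risk level; for example, a Redundant Fault Tolerant (3) function tied to a Catastrophic (1) mishap yields SwCI 2.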


Assessment of software contribution to risk. All software contributions to system risk, including any results of Table VI application, shall be documented in the HTS.


Table VI. Relationship between SwCI, risk level, LOR tasks, and risk

Software Criticality Index (SwCI) | Risk Level | Software LOR Tasks and Risk Assessment/Acceptance
SwCI 1 | High | If SwCI 1 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as HIGH and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 1 LOR tasks or prepare a formal risk assessment for acceptance of a HIGH risk.
SwCI 2 | Serious | If SwCI 2 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as SERIOUS and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 2 LOR tasks or prepare a formal risk assessment for acceptance of a SERIOUS risk.
SwCI 3 | Medium | If SwCI 3 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as MEDIUM and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 3 LOR tasks or prepare a formal risk assessment for acceptance of a MEDIUM risk.
SwCI 4 | Low | If SwCI 4 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as LOW and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 4 LOR tasks or prepare a formal risk assessment for acceptance of a LOW risk.
SwCI 5 | Not Safety | No safety-specific analyses or testing is required.
  1. The Table V LOR tasks shall be performed to assess the software contributions to the system-level risk. Results of the LOR tasks provide a level of confidence in safety-significant software and document causal factors and hazards that may require mitigation. Results of the LOR tasks shall be included in the risk management process. Appendix B of MIL-STD-882E provides an example of how to assign a risk level to software contributions to system risk identified by completing the LOR analysis.
  2. If the required LOR tasks are not performed, then the system risk(s) contributions associated with unspecified or incomplete LOR tasks shall be documented according to Table VI. Table VI depicts the relationship between SwCI, risk levels, completion of LOR tasks, and risk assessment.
  3. All software contributions to system risk, including any results of Table VI application, shall be documented in the HTS. Perform risk acceptance in accordance with the risk management process outlined in DoDI 5000.85, paragraph 3C.3.d.
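
The Table VI rule for unperformed LOR tasks can be sketched as a small function. This is an illustrative reading of the table, not an official algorithm: the function name is hypothetical, and returning None is an assumption used here to signal that Table VI does not assign a level (either the software is Not Safety, or the LOR tasks were completed and their results feed the normal risk management process instead).

```python
from typing import Optional

# Table VI: risk level documented when LOR tasks are unspecified or incomplete.
SWCI_RISK_LEVEL = {1: "High", 2: "Serious", 3: "Medium", 4: "Low"}


def software_risk_contribution(swci: int, lor_tasks_complete: bool) -> Optional[str]:
    """Risk level to document in the HTS under Table VI, or None if not applicable."""
    if swci not in (1, 2, 3, 4, 5):
        raise ValueError(f"unknown SwCI: {swci}")
    if swci == 5:
        return None  # Not Safety: no safety-specific analyses or testing required
    if lor_tasks_complete:
        return None  # risk comes from the LOR results, not from Table VI
    return SWCI_RISK_LEVEL[swci]
```

For example, software_risk_contribution(2, lor_tasks_complete=False) returns "Serious", which would then be provided to the PM for a decision on whether to expend the resources for the SwCI 2 tasks or to prepare a formal risk assessment for acceptance.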

Resources